basis distribution
VAEs and GANs: Implicitly Approximating Complex Distributions with Simple Base Distributions and Deep Neural Networks -- Principles, Necessity, and Limitations
This tutorial focuses on the fundamental architectures of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), setting aside their numerous variants, to highlight their core principles. Both VAEs and GANs start from a simple base distribution, such as a Gaussian, and use the nonlinear transformation capabilities of neural networks to approximate arbitrarily complex distributions. The theoretical basis is that a linear combination (mixture) of sufficiently many Gaussians can approximate almost any probability distribution, while neural networks enable further refinement through nonlinear transformations. Both methods approximate complex data distributions implicitly. This implicit approximation is crucial because explicitly modeling high-dimensional distributions is often intractable. However, the choice of a simple latent prior, while computationally convenient, introduces limitations. In VAEs, the fixed Gaussian prior forces the posterior distribution to align with it, potentially leading to loss of information and reduced expressiveness. This restriction affects both the interpretability of the model and the quality of generated samples.
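To make the pull of the fixed Gaussian prior concrete, the sketch below computes the KL term that appears in the standard VAE objective, assuming a diagonal Gaussian posterior and a standard normal prior. The abstract does not spell out this formula; the function name `kl_to_standard_normal` and the example values are illustrative assumptions.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

# Posterior identical to the prior: no penalty.
kl_same = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])      # 0.0

# Posterior mean shifted away from zero: the penalty grows with mu^2.
kl_shifted = kl_to_standard_normal([2.0, 0.0], [0.0, 0.0])   # 2.0

# Posterior narrower than the prior (hypothetical values): also penalized.
kl_narrow = kl_to_standard_normal([0.0, 0.0], [-2.0, -2.0])
```

The penalty is zero only when the posterior matches the prior, which is one way to read the abstract's remark that the prior forces the posterior to align with it.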
Explainable Multimodal Machine Learning for Revealing Structure-Property Relationships in Carbon Nanotube Fibers
Kimura, Daisuke, Tajima, Naoko, Okazaki, Toshiya, Muroga, Shun
In this study, we propose Explainable Multimodal Machine Learning (EMML), which integrates the analysis of diverse data types (multimodal data) using factor analysis for feature extraction with Explainable AI (XAI), for carbon nanotube (CNT) fibers prepared from aqueous dispersions. This method is a powerful approach to elucidate the mechanisms governing material properties, where multi-stage fabrication conditions and multiscale structures have complex influences. In our case, it helps us understand how different processing steps and structures at various scales impact the final properties of CNT fibers. The analysis targeted structures ranging from the nanoscale to the macroscale, including aggregation size distributions of CNT dispersions and the effective length of CNTs. Furthermore, distribution data that were difficult to interpret using standard methods were analyzed using Non-negative Matrix Factorization (NMF) to extract the key features that determine the outcome. Contribution analysis with SHapley Additive exPlanations (SHAP) demonstrated that small, uniformly distributed aggregates are crucial for improving fracture strength, while CNTs with long effective lengths are significant factors for enhancing electrical conductivity. The analysis also identified thresholds and trends for these key factors to assist in defining the conditions needed to optimize CNT fiber properties. EMML is not limited to CNT fibers but can be applied to the design of other materials derived from nanomaterials, making it a useful tool for developing a wide range of advanced materials. This approach provides a foundation for advancing data-driven materials research.
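As a rough illustration of how NMF can pull a few non-negative basis shapes out of distribution data, here is a minimal sketch using multiplicative updates for the squared-error objective. It is not the authors' pipeline: the toy histograms, the rank, the iteration count, and all function names are assumptions made for this example, and the SHAP step is not shown.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-12):
    """Factor non-negative V (n x m) into W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - W H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, R) for v, r in zip(rv, rr)) ** 0.5

# Hypothetical data: four histograms built from two underlying shapes.
basis = [[4, 3, 2, 1, 0, 0], [0, 0, 1, 2, 3, 4]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
V = matmul(weights, basis)

W, H = nmf(V, rank=2)
err = frobenius_error(V, W, H)
```

Each row of `H` plays the role of an extracted basis shape and each row of `W` gives the per-sample weights, which could then serve as input features for a downstream model.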
Sparse Overcomplete Latent Variable Decomposition of Counts Data
Shashanka, Madhusudana, Raj, Bhiksha, Smaragdis, Paris
An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and also do not have a provision to control the "expressiveness" of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
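For orientation, the sketch below shows plain maximum-likelihood PLSA fitted with EM on a small count matrix, which is the framework the abstract says it starts from. The entropic prior and the resulting sparse MAP update are not implemented here; the toy counts, the number of components, and all names are assumptions for illustration.

```python
import math
import random

def normalize(xs, eps=1e-12):
    """Scale a non-negative list to sum to 1 (tiny floor avoids division by zero)."""
    xs = [x + eps for x in xs]
    s = sum(xs)
    return [x / s for x in xs]

def plsa(counts, n_topics, iters=100, seed=0):
    """counts[d][w] is the count of item w in document d.

    Returns (p_w_z, p_z_d): P(w | z) per component and P(z | d) per document.
    """
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(W)]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(D)]
    for _ in range(iters):
        new_w_z = [[0.0] * W for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(D)]
        for d in range(D):
            for w in range(W):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over components for this (d, w) cell
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = counts[d][w] * joint[z] / total
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalize the expected counts
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d

def log_likelihood(counts, p_w_z, p_z_d):
    """Log-likelihood of the counts under the mixture sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, c in enumerate(row):
            if c:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += c * math.log(p)
    return ll

# Hypothetical counts: two documents favor items 0-1, two favor items 2-3.
counts = [[5, 4, 0, 0], [4, 6, 1, 0], [0, 0, 5, 5], [0, 1, 6, 4]]
p_w_z, p_z_d = plsa(counts, n_topics=2)
```

In this plain form nothing encourages the mixture weights or components to be sparse, which is the gap the abstract's entropic prior is meant to address.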